Skip to content

Revert: Remove chat feature (accidental direct push)#206

Merged
NotYuSheng merged 1 commit into
devfrom
revert-chat-removal-from-dev
Sep 30, 2025
Merged

Revert: Remove chat feature (accidental direct push)#206
NotYuSheng merged 1 commit into
devfrom
revert-chat-removal-from-dev

Conversation

@NotYuSheng

@NotYuSheng NotYuSheng commented Sep 30, 2025

Copy link
Copy Markdown
Owner

User description

Reverting the accidental direct push to dev. The proper workflow is to create a PR from a feature branch.

This PR reverts commit 7b3c22e which was accidentally pushed directly to dev instead of through a feature branch PR.

The correct PR for removing chat will be created from the feat/remove-chat-feature branch.


PR Type

Enhancement, Tests, Documentation


Description

Restores complete chat service functionality - Reverts accidental removal of comprehensive RAG-based chat system
Implements RAG query processing - Adds query validation, document retrieval with ChromaDB, and LLM-based response generation
Adds streaming chat UI - Creates interactive chat interface with document selection, preset queries, and word-by-word streaming
Provides comprehensive testing - Includes 206+ unit tests across chat service components and models
Configures full deployment infrastructure - Adds Helm charts, Kubernetes templates, network policies, and RBAC configuration
Integrates with existing services - Updates PDF processor, frontend, and infrastructure components for chat service communication
Adds security and monitoring - Includes Trivy security scanning, Prometheus metrics, and zero-trust network policies
Updates documentation - Revises architecture diagrams, deployment guides, and service references


Diagram Walkthrough

flowchart LR
  A["Accidental Direct Push"] --> B["Revert Commit"]
  B --> C["Chat Service Core"]
  B --> D["Frontend Chat UI"]
  B --> E["Infrastructure Config"]
  
  C --> C1["RAG Processing"]
  C --> C2["Query Validation"]
  C --> C3["ChromaDB Integration"]
  
  D --> D1["Streaming Interface"]
  D --> D2["Document Selection"]
  D --> D3["Preset Queries"]
  
  E --> E1["Helm Charts"]
  E --> E2["Network Policies"]
  E --> E3["RBAC Configuration"]
Loading

File Walkthrough

Relevant files
Enhancement
13 files
rag_config.py
Add comprehensive RAG configuration and optimization system

chat_service/models/rag_config.py

• Added comprehensive RAG configuration system with query validation,
classification, and optimization
• Implemented EnhancedQueryValidator
with validation examples and prompt generation
• Created RAGConfig
class for model parameters and settings management
• Added
QueryClassificationExamples and PromptTemplates for different query
types
• Implemented RAGOptimizer with LLM-based query classification
and response post-processing

+569/-0 
chat.py
Implement chat router with RAG query processing                   

chat_service/routers/chat.py

• Implemented complete chat endpoint with RAG query processing
• Added
query validation using LLM before performing RAG operations

Integrated ChromaDB for document retrieval with reranking capabilities

• Added session-based document access control and filtering

Implemented response generation with configurable post-processing

+385/-0 
5_chat_UI.py
Add chat UI with document selection and streaming               

frontend/my_pages/5_chat_UI.py

• Created complete chat interface with document selection and preset
queries
• Implemented streaming response display with configurable
word-by-word effect
• Added preset buttons for summarization, main
topic, and key findings
• Integrated session management and document
filtering for chat requests

+250/-0 
main.py
Integrate chat service into frontend navigation                   

frontend/main.py

• Added CHAT_URL environment variable configuration
• Integrated chat
UI page into the main navigation structure
• Updated CSS styling for
chat container components

+9/-1     
process.py
Add semantic embedder support to processing pipeline         

pdf_processor_service/utils/process.py

• Added wait_for_semantic_embedder function for semantic embedding
processing
• Updated process_file_basic to run semantic and sentence
embedders concurrently
• Imported load_or_create_semantic_embedder_job
function

+9/-2     
chat.py
Add chat router with session-based access control               

pdf_processor_service/routers/chat.py

• Created chat router with session validation and document access
control
• Implemented request proxying to chat service with session ID
injection
• Added validation for document access permissions per
session

+38/-0   
chat.py
Add chat data models for PDF processor service                     

pdf_processor_service/models/chat.py

• Defined ChatRequest and ChatResponse Pydantic models
• Added
collection name constants for semantic and textual embeddings

Configured default collection to use semantic embeddings

+23/-0   
chat.py
Add chat data models for chat service                                       

chat_service/models/chat.py

• Created ChatRequest model with session ID and document ID support

Implemented ChatResponse model with relevant chunks and metadata

Added field validation and default values for collection names

+25/-0   
main.py
Integrate chat router into PDF processor service                 

pdf_processor_service/main.py

• Added chat router to the main FastAPI application
• Imported and
included chat router in the application routing

+2/-1     
10_settings_UI.py
Add chat streaming settings to UI preferences                       

frontend/my_pages/10_settings_UI.py

• Added chat settings section with streaming toggle control

Implemented user preference for word-by-word streaming effect
• Added
help text explaining the streaming functionality

+9/-0     
main.py
Create main FastAPI application for chat service                 

chat_service/main.py

• Created main FastAPI application for chat service
• Added Prometheus
metrics instrumentation
• Configured logging and included health and
chat routers

+19/-0   
process_pdf.py
Add chat service to backend health monitoring                       

frontend/components/process_pdf.py

• Added Chat Service to backend status monitoring
• Included CHAT_URL
environment variable in service health checks

+1/-0     
health.py
Add health check endpoint for chat service                             

chat_service/routers/health.py

• Implemented basic health check endpoint
• Added simple status
response for service monitoring

+8/-0     
Tests
5 files
test_chat.py
Add comprehensive unit tests for chat service                       

chat_service/tests/test_chat.py

• Added comprehensive unit tests for chat functionality
• Implemented
tests for retrieval result processing and chunk reranking
• Added
tests for RAG query execution and error handling
• Created tests for
chat endpoint with various scenarios including validation failures

+300/-0 
test_models.py
Add unit tests for chat data models                                           

chat_service/tests/test_models.py

• Added unit tests for ChatRequest and ChatResponse Pydantic models

Implemented validation tests for required fields and optional
parameters
• Added tests for model creation with various input
scenarios

+79/-0   
conftest.py
Add test configuration with mocked dependencies                   

chat_service/tests/conftest.py

• Created test configuration with mocked external dependencies
• Set
up environment variables for unit testing
• Added fixtures for mocking
OpenAI and ChromaDB clients

+45/-0   
__init__.py
Initialize tests package for chat service                               

chat_service/tests/init.py

• Created tests package initialization file
• Added package marker for
chat service test modules

+1/-0     
test-connection.yaml
Add Helm test template for chat service                                   

helm/chat-service/templates/tests/test-connection.yaml

• New Helm test template for chat service connectivity
• Defines
health check test using busybox wget
• Enables automated testing of
service availability

+15/-0   
Configuration changes
39 files
deploy-helm-charts.sh
Add chat service to Helm deployment scripts                           

scripts/deploy-helm-charts.sh

• Added chat-service to supported services list
• Updated examples and
documentation to reference chat-service
• Included chat-service in
deployment order for all services

+7/-6     
load-images.sh
Update image loading examples for chat service                     

helm/load-images.sh

• Updated examples to reference chat_service instead of
pdf_extraction_service
• Modified documentation and help text for chat
service image loading

+5/-5     
create-secrets.sh
Add chat service to secret creation script                             

create-secrets.sh

• Added chat-service to secret creation mapping
• Updated secret
listing pattern to include chat-service-secrets

+4/-2     
scan_with_trivy.sh
Add chat service to security scanning pipeline                     

scripts/scan_with_trivy.sh

• Added chat_service to the list of services for security scanning

Included chat service in Trivy vulnerability assessment

+2/-1     
test-all-services.sh
Add chat service to testing pipeline                                         

scripts/test-all-services.sh

• Added chat_service to the list of services with unit tests

Included chat service in comprehensive testing pipeline

+2/-1     
test-single-service.sh
Add chat service to single service testing                             

scripts/test-single-service.sh

• Added chat_service to supported services list for individual testing

• Enabled single service test execution for chat service

+1/-0     
_helpers.tpl
Add Helm template helpers for chat service                             

helm/chat-service/templates/_helpers.tpl

• Created Helm template helpers for chat-service deployment
• Added
standard Kubernetes naming and labeling functions
• Implemented
service account and secret name generation helpers

+73/-0   
Makefile
Update Makefile examples for chat service                               

Makefile

• Updated examples and documentation to reference chat-service

Modified help text and usage examples for chat service deployment

Changed example service names from pdf-extraction-service to
chat-service

+6/-6     
values-prestaging.yaml
Add chat service to Istio gateway configuration                   

helm/istio-gateway/values-prestaging.yaml

• Added chat-service destination rule for Istio service mesh

Configured mutual TLS mode for chat service communication

+4/-0     
NOTES.txt
Add Helm chart notes template for chat service                     

helm/chat-service/templates/NOTES.txt

• New Helm chart notes template for chat service deployment
• Includes
service access instructions, health checks, and testing commands

Provides monitoring and network security information

+61/-0   
values.yaml
Add Helm values configuration for chat service                     

helm/chat-service/values.yaml

• New Helm values configuration for chat service
• Defines LLM
configuration, resources, networking policies, and service settings

Includes Istio sidecar injection and monitoring configuration

+122/-0 
values-prestaging.yaml
Add prestaging Helm values for chat service                           

helm/chat-service/values-prestaging.yaml

• Prestaging-specific Helm values for chat service
• Configures
minimal resources, external vLLM integration, and network policies

Includes disk pressure tolerations and Istio configuration

+104/-0 
deployment.yaml
Add Kubernetes deployment template for chat service           

helm/chat-service/templates/deployment.yaml

• New Kubernetes deployment template for chat service
• Defines
container configuration, environment variables, and resource limits

Includes LLM configuration and secret management

+92/-0   
networkpolicy.yaml
Add NetworkPolicy template for chat service                           

helm/chat-service/templates/networkpolicy.yaml

• New NetworkPolicy template for chat service zero-trust security

Defines ingress rules for PDF processor service communication

Configures egress rules for ChromaDB, MinIO, Redis, and external LLM
access

+66/-0   
servicemonitor.yaml
Add ServiceMonitor template for chat service monitoring   

helm/chat-service/templates/servicemonitor.yaml

• New ServiceMonitor template for Prometheus metrics collection

Configures scraping intervals, timeouts, and metric relabeling

Enables monitoring integration for chat service

+41/-0   
Dockerfile
Add Dockerfile for chat service container                               

chat_service/Dockerfile

• New Dockerfile for chat service container image
• Configures Python
3.13 environment with FastAPI and dependencies
• Includes security
hardening with non-root user

+36/-0   
values.yaml
Add RBAC configuration for chat service                                   

helm/rbac/values.yaml

• Added chat-service to service accounts list
• Added chat-service
RBAC configuration with ChromaDB, MinIO, and Redis access
• Updated
pdf-processor-service to include chat-service in canCall list

+12/-0   
hpa.yaml
Add HPA template for chat service autoscaling                       

helm/chat-service/templates/hpa.yaml

• New HorizontalPodAutoscaler template for chat service
• Configures
CPU and memory-based autoscaling
• Enables dynamic scaling based on
resource utilization

+32/-0   
poddisruptionbudget.yaml
Add PodDisruptionBudget template for chat service               

helm/chat-service/templates/poddisruptionbudget.yaml

• New PodDisruptionBudget template for chat service high availability

• Configures minimum available replicas during cluster maintenance

Ensures service availability during rolling updates

+23/-0   
example.env
Add environment configuration for chat service                     

chat_service/example.env

• New environment configuration file for chat service
• Defines LLM
settings, RAG optimization parameters, and Redis connection
• Includes
model configuration and context management settings

+23/-0   
example.env
Update image captioner service environment configuration 

image_captioner_service/example.env

• Updated OPENAI_BASE_URL from IP address to hostname
• Changed
OPENAI_API_KEY from token format to lm-studio

+2/-2     
docker-compose.yml
Add chat service to Docker Compose configuration                 

docker-compose.yml

• Added chat_service container configuration
• Configured build
context and environment file
• Added dependency on embedder_service

+10/-0   
images.txt
Add chat service image to Helm images list                             

helm/images.txt

• Added ghcr.io/notyusheng/chat_service:dev-v0.0.3-5d69f89 to image
list
• Updated core application services section

+1/-0     
Chart.yaml
Add Helm chart metadata for chat service                                 

helm/chat-service/Chart.yaml

• New Helm chart metadata for chat service
• Defines chart version,
description, and keywords
• Configures chart as application type with
RAG capabilities

+18/-0   
service.yaml
Add Kubernetes Service template for chat service                 

helm/chat-service/templates/service.yaml

• New Kubernetes Service template for chat service
• Configures
ClusterIP service with port 8000
• Defines service selector and port
mapping

+15/-0   
serviceaccount.yaml
Add ServiceAccount template for chat service                         

helm/chat-service/templates/serviceaccount.yaml

• New ServiceAccount template for chat service
• Configures service
account creation and token mounting
• Includes metadata labels and
annotations support

+13/-0   
values-prestaging.yaml
Update ChromaDB network policy for chat service access     

helm/chromadb/values-prestaging.yaml

• Added chat-service to NetworkPolicy ingress allowedCallers
• Updated
network policy to allow chat service access to ChromaDB

+4/-0     
values.yaml
Update ChromaDB network policy for chat service access     

helm/chromadb/values.yaml

• Added chat-service to NetworkPolicy ingress allowedCallers
• Updated
network policy to allow chat service access to ChromaDB

+4/-0     
example.env
Add chat service URL to PDF processor configuration           

pdf_processor_service/example.env

• Added CHAT_SERVICE_URL=http://chat_service:8000/chat environment
variable
• Updated service URL configuration to include chat service
endpoint

+1/-0     
values-prestaging.yaml
Update PDF processor network policy for chat service         

helm/pdf-processor-service/values-prestaging.yaml

• Added chat-service to NetworkPolicy egress allowedTargets
• Updated
network policy to allow PDF processor to communicate with chat service

+5/-0     
values.yaml
Update PDF processor network policy for chat service         

helm/pdf-processor-service/values.yaml

• Added chat-service to NetworkPolicy egress allowedTargets
• Updated
network policy to allow PDF processor to communicate with chat service

+5/-0     
values.yaml
Add chat service to Istio gateway destination rules           

helm/istio-gateway/values.yaml

• Added chat-service.omnipdf-prestaging.svc.cluster.local to
destination rules
• Configured mTLS traffic policy for chat service

+4/-0     
values-prestaging.yaml
Update Redis network policy for chat service access           

helm/redis/values-prestaging.yaml

• Added chat-service to NetworkPolicy ingress allowedCallers
• Updated
network policy to allow chat service access to Redis

+4/-0     
values-prestaging.yaml
Update MinIO network policy for chat service access           

helm/minio/values-prestaging.yaml

• Added chat-service to NetworkPolicy ingress allowedCallers
• Updated
network policy to allow chat service access to MinIO

+4/-0     
example.env
Add chat service URL to frontend configuration                     

frontend/example.env

• Added CHAT_URL=http://chat_service:8000 environment variable

Updated frontend configuration to include chat service endpoint

+1/-0     
values.yaml
Update Redis network policy for chat service access           

helm/redis/values.yaml

• Added chat-service to NetworkPolicy ingress allowedCallers
• Updated
network policy to allow chat service access to Redis

+4/-0     
values.yaml
Update MinIO network policy for chat service access           

helm/minio/values.yaml

• Added chat-service to NetworkPolicy ingress allowedCallers
• Updated
network policy to allow chat service access to MinIO

+4/-0     
example.env
Add chat service URL to nginx configuration                           

nginx/example.env

• Added CHAT_URL=http://chat_service:8001 environment variable

Updated nginx configuration to include chat service endpoint

+1/-0     
services-to-build.txt
Add chat service to GitHub Actions build list                       

.github/services-to-build.txt

• Added chat_service to the list of core application services to build

• Updated CI/CD build configuration to include chat service

+1/-0     
Miscellaneous
1 files
chat_service-report.txt
Add security scan report for chat service                               

trivy_scan_results/chat_service-report.txt

• Generated security scan report for chat service Docker image
• Shows
clean scan results with no high or critical vulnerabilities
• Includes
comprehensive package and filesystem vulnerability assessment

+371/-0 
Documentation
9 files
README.md
Restore chat service references in main documentation       

README.md

• Added chat service to the architecture description and microservices
list
• Updated service count from 13 to 14 individual RBAC roles

Added chat-service to network policy tables and deployment examples

Updated test count from 180+ to 206+ tests across 7 services

+21/-20 
SECRET-MANAGEMENT.md
Update secret management examples for chat service             

helm/SECRET-MANAGEMENT.md

• Updated examples to use chat-service instead of
pdf-extraction-service
• Changed secret references from
pdf-extraction-service-secrets to chat-service-secrets
• Modified
command examples and service references throughout

+25/-25 
LOCAL_IMAGE_REPOSITORY.md
Update local image repository examples for chat service   

helm/LOCAL_IMAGE_REPOSITORY.md

• Replaced pdf_extraction_service references with chat_service in
examples
• Updated Docker image names and repository paths
• Modified
deployment and configuration examples

+21/-21 
c4-diagram.puml
Add chat service to C4 architecture diagram                           

c4-diagram.puml

• Added chat_service container definition with RAG capabilities

Updated frontend description to include chat interface
• Added
relationships between chat service and other components (ChromaDB,
Redis, MinIO, vLLM)

+8/-2     
README.md
Update testing documentation for chat service                       

scripts/README.md

• Updated examples to use chat_service instead of
pdf_extraction_service
• Changed service count from 6 to 7 services in
testing documentation
• Updated test output examples and service
descriptions

+18/-18 
README.md
Update RBAC documentation for chat service                             

helm/rbac/README.md

• Updated service coverage from 13/13 to 14/14 complete
• Added
chat-service to service table and data access patterns
• Updated
ChromaDB access patterns to include chat service

+12/-9   
NETWORK-POLICY-REFERENCE.md
Update network policy reference for chat service                 

helm/NETWORK-POLICY-REFERENCE.md

• Added chat-service to external AI communication table
• Updated
service references and communication patterns
• Modified ChromaDB
access patterns to include chat service

+4/-4     
INSTALL.md
Update Istio gateway installation for chat service             

helm/istio-gateway/INSTALL.md

• Added chat-service to service deployment loop
• Updated service
communication testing examples
• Modified deployment instructions to
include chat service

+2/-2     
Chart.yaml
Update frontend chart description for chat functionality 

helm/frontend/Chart.yaml

• Updated chart description to include "and chat" functionality

Modified description from PDF processing only to include chat
interface

+1/-1     
Dependencies
1 files
requirements.txt
Add Python requirements for chat service                                 

chat_service/requirements.txt

• New Python dependencies file for chat service
• Includes FastAPI,
OpenAI, ChromaDB, and monitoring libraries
• Defines specific versions
for all required packages

+8/-0     
Additional files
4 files
__init__.py [link]   
__init__.py [link]   
values-prod.yaml [link]   
values-staging.yaml [link]   

@NotYuSheng NotYuSheng merged commit 527487e into dev Sep 30, 2025
5 checks passed
@gemini-code-assist

Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @NotYuSheng, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request is a revert of a previous commit that accidentally removed the chat feature by being pushed directly to the dev branch. The primary purpose of this revert is to restore the chat feature, ensuring that its removal (which will happen in a future PR) follows the proper development workflow through a dedicated feature branch pull request. This action temporarily re-adds the chat functionality that was prematurely removed.

Highlights

  • Revert Accidental Push: This pull request reverts a previous commit that was accidentally pushed directly to the dev branch, which had removed the chat_service.
  • Re-introduction of Chat Service: The chat_service and all its associated files, including its Dockerfile, environment variables, FastAPI application, Pydantic models, RAG optimization logic, and unit tests, are re-introduced.
  • Configuration and Documentation Updates: Various configuration files and documentation, such as Makefile, README.md, Helm charts, docker-compose.yml, frontend UI, and scripts, are updated to correctly integrate the chat_service back into the system.
  • Restoration of Service Integration: The chat_service's presence is restored in build processes, deployment configurations, network policies, and RBAC roles across the project.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@codiumai-pr-agent-free

Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Error Handling

The chat endpoint doesn't properly handle the case when doc_ids is None but the validation requires checking each doc_id. This could lead to runtime errors if doc_ids is None.

OPENAI_MODEL_NAME = rag_config.model_name
Error Handling

The chat_with_rag function has a max_retries of 600 with 1 second delay, which could lead to extremely long waiting times (10 minutes) before timing out. Consider a more reasonable timeout strategy.

async def chat_with_rag(prompt: str, doc_ids: list[str] = None, collection_name: str = "SemanticEmbeds", max_retries: int = 600, delay: int = 1):
    """
Performance Concern

The post_process_llm_response function uses a potentially inefficient approach to remove duplicate lines by comparing each line with the previous one, which could be slow for very large responses.

def post_process_llm_response(response: str) -> str:
    """Post-process LLM response for better formatting"""

    # Remove only consecutive duplicate lines to avoid breaking formatted content.
    lines = response.split("\n")
    if not lines:
        return ""

    filtered_lines = [lines[0]]
    for i in range(1, len(lines)):
        current_line_stripped = lines[i].strip()
        prev_line_stripped = lines[i - 1].strip()
        if not current_line_stripped or current_line_stripped != prev_line_stripped:
            filtered_lines.append(lines[i])

    cleaned_response = "\n".join(filtered_lines)

    # Ensure response ends properly
    if cleaned_response and not cleaned_response.rstrip().endswith(
        (".", "!", "?", ":")
    ):
        cleaned_response = cleaned_response.rstrip() + "."

    return cleaned_response.strip()

@codiumai-pr-agent-free

Copy link
Copy Markdown

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix JSON decoding error in frontend

In the frontend chat UI, parse the response from the chat service using
json.loads(response.text) instead of response.json() to fix a JSON decoding
error.

frontend/my_pages/5_chat_UI.py [61-68]

 try:
-    logger.info(f"Decode JSON from response: {response}")
+    logger.info(f"Decode JSON from response: {response.text}")
     logger.info(f"Decode JSON from headers: {response.headers}")
-    data = response.json()
+    data = json.loads(response.text)
 except json.JSONDecodeError as e:
     logger.error(f"Failed to decode JSON from response: {response.text}: {e}")
     server_status.error("Received an invalid response from the server.")
     return "Sorry, I received an invalid response from the server."
  • Apply / Chat
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a double JSON encoding issue due to the proxy layer, which would cause a runtime JSONDecodeError and break the chat functionality.

High
Fix broken external service connectivity

Add a missing egress rule for HTTP (port 80) to the networkpolicy.yaml template
to allow connectivity to the LLM service, which is configured with an HTTP
endpoint.

helm/chat-service/templates/networkpolicy.yaml [45-50]

+{{- if .Values.networkPolicy.egress.allowHTTP }}
+- to: []
+  ports:
+    - protocol: TCP
+      port: 80
+{{- end }}
 {{- if .Values.networkPolicy.egress.allowHTTPS }}
 - to: []
   ports:
     - protocol: TCP
       port: 443
 {{- end }}
  • Apply / Chat
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a critical bug where the network policy blocks HTTP egress traffic, preventing the service from connecting to its configured LLM endpoint and rendering it non-functional.

High
Fix inconsistent service port configuration

Standardize the chat_service port in nginx/example.env from 8001 to 8000 to
match other configurations and prevent connection errors.

nginx/example.env [2]

-CHAT_URL=http://chat_service:8001
+CHAT_URL=http://chat_service:8000

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a critical port inconsistency for the chat_service across different configuration files, which would lead to a runtime failure of the service.

High
General
Use default factory for list field

In the ChatRequest model, change the default for doc_ids from None to an empty
list by using Field(default_factory=list).

chat_service/models/chat.py [5-13]

 class ChatRequest(BaseModel):
     """
     Request model for chat API endpoints.
     """
 
     message: str
     session_id: str
-    doc_ids: List[str] = None
+    doc_ids: List[str] = Field(default_factory=list)
     collection_name: str = Field(default="default_collection", description="ChromaDB collection name")
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that using default_factory=list is better practice for mutable defaults in Pydantic models, preventing potential TypeError exceptions.

Medium
Generalize default Helm chart values

Generalize the default values.yaml by removing hardcoded, environment-specific
values and replacing them with generic defaults or placeholders to improve chart
reusability.

helm/chat-service/values.yaml [1-31]

-# Pre-staging environment overrides - minimal for disk pressure
+# Default values for chat-service.
+# This is a YAML-formatted file.
+# Declare variables to be passed into your templates.
 replicaCount: 1
 
-# Pre-staging LLM configuration - External vLLM service on VPN
 llm:
-  baseUrl: "http://100.108.2.57/vllm_qwen2.5/v1"
-  model: "Qwen2.5-14B-Coder-Instruct"  # Model name as served by external vLLM
+  baseUrl: "" # e.g., http://my-vllm-service/v1
+  model: ""   # e.g., Qwen2.5-14B-Coder-Instruct
 
-# Pre-staging resources - minimal for disk pressure
 resources:
+  limits:
+    cpu: 500m
+    memory: 512Mi
+  requests:
+    cpu: 200m
+    memory: 256Mi
 ...
-# Pre-staging specific configurations - using local CRC registry
 image:
-  repository: default-route-openshift-image-registry.apps-crc.testing/omnipdf/chat_service
-  tag: "dev-v0.0.3-5d69f89"
-  pullPolicy: Always  # Use local registry image
+  repository: omnipdf/chat_service
+  tag: "" # Overridden in environment-specific values files
+  pullPolicy: IfNotPresent
 ...

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that the default values.yaml is polluted with environment-specific values, which is against Helm best practices and harms reusability.

Medium
Rename test for clarity and accuracy

Rename the test test_invalid_chat_response_no_metadata to be more specific,
clarifying it tests the ChatResponse model from the chat_service.

chat_service/tests/test_models.py [76-79]

-def test_invalid_chat_response_no_metadata(self):
-    """Test ChatResponse validation without metadata"""
+def test_chat_service_invalid_response_no_metadata(self):
+    """Test ChatResponse validation without metadata for chat_service"""
     with pytest.raises(ValidationError):
         ChatResponse(response="AI response")
  • Apply / Chat
Suggestion importance[1-10]: 3

__

Why: The suggestion correctly identifies a potentially confusing test name and proposes a more descriptive one, which improves test suite clarity and maintainability.

Low
Security
Optimize Docker image with multi-stage build

Optimize the Docker image by implementing a multi-stage build to remove
build-time dependencies, resulting in a smaller and more secure final image.

chat_service/Dockerfile [7-25]

+# --- Builder Stage ---
+FROM python:3.13-slim as builder
+
+WORKDIR /app
+
 RUN apt-get update && apt-get install -y build-essential && rm -rf /var/lib/apt/lists/*
 
-# Copy requirements.txt from chat_service folder in root context
 COPY chat_service/requirements.txt .
+RUN pip install --no-cache-dir --prefix="/install" -r requirements.txt
 
-RUN pip install --no-cache-dir -r requirements.txt
+# --- Final Stage ---
+FROM python:3.13-slim
 
-# Copy utils folder from root context into /app/utils
+WORKDIR /app
+
+ENV PYTHONPATH=/app/chat_service:/app/shared_utils
+
+COPY --from=builder /install /usr/local
 COPY shared_utils ./shared_utils
-
-# Copy chat_service folder from root context into /app/chat_service
 COPY chat_service ./chat_service
 
-# Remove unnecessary system packages
-RUN apt-get update && \
-    DEBIAN_FRONTEND=noninteractive apt-get remove --purge -y linux-libc-dev && \
-    apt-get autoremove -y && \
-    apt-get clean && \
-    rm -rf /var/lib/apt/lists/*
+# Create non-root user for security and set permissions
+RUN groupadd -r appuser && useradd -r -g appuser -d /app -s /bin/bash appuser && \
+    chown -R appuser:appuser /app
 
+USER appuser
+
+# Expose the FastAPI port
+EXPOSE 8000
+
+CMD ["sh", "-c", "PYTHONPATH=/app/chat_service uvicorn chat_service.main:app --host 0.0.0.0 --port 8000"]
+
  • Apply / Chat
Suggestion importance[1-10]: 6

__

Why: The suggestion proposes a multi-stage build to create a smaller and more secure Docker image, which is a valuable best practice for production environments.

Low
  • More

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request reverts an accidental commit and correctly re-introduces the chat feature. This is a substantial addition, including a new chat_service, frontend UI components, extensive RAG-based processing logic, and all the necessary infrastructure and configuration changes. The implementation is comprehensive and well-structured. My review focuses on several critical bugs in the tests that would prevent them from passing, a potential runtime bug in the Dockerfile configuration, and several medium-severity suggestions to improve maintainability, robustness, and consistency across the new services.

Comment thread chat_service/Dockerfile
Comment on lines +5 to +36
ENV PYTHONPATH=/app/chat_service:/app/shared_utils

RUN apt-get update && apt-get install -y build-essential && rm -rf /var/lib/apt/lists/*

# Copy requirements.txt from chat_service folder in root context
COPY chat_service/requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

# Copy utils folder from root context into /app/utils
COPY shared_utils ./shared_utils

# Copy chat_service folder from root context into /app/chat_service
COPY chat_service ./chat_service

# Remove unnecessary system packages
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get remove --purge -y linux-libc-dev && \
apt-get autoremove -y && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

# Create non-root user for security and set permissions
RUN groupadd -r appuser && useradd -r -g appuser -d /app -s /bin/bash appuser && \
chown -R appuser:appuser /app

USER appuser

# Expose the FastAPI port
EXPOSE 8000

CMD ["sh", "-c", "PYTHONPATH=/app/chat_service uvicorn chat_service.main:app --host 0.0.0.0 --port 8000"]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The CMD instruction is incorrectly overriding the PYTHONPATH environment variable, which will likely cause ImportError at runtime. Additionally, the Dockerfile can be optimized to reduce image size by combining RUN layers.

  1. CMD Bug: The ENV at line 5 correctly sets PYTHONPATH to include both chat_service and shared_utils. However, the CMD at line 36 re-defines PYTHONPATH for its execution scope but omits /app/shared_utils. This will prevent modules from shared_utils from being imported.

  2. Layer Optimization: The Dockerfile runs apt-get update twice (lines 7 and 21) and uses separate RUN commands to install and then remove packages. This creates unnecessary layers. These steps should be combined into a single RUN command to install build dependencies, use them to install Python packages, and then clean them up in the same layer.

I suggest refactoring the CMD to use the exec form and rely on the ENV variable, and consolidating the RUN commands for better efficiency and a smaller final image.

ENV PYTHONPATH=/app

RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential && \
    rm -rf /var/lib/apt/lists/*

# Copy requirements.txt from chat_service folder in root context
COPY chat_service/requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt && \
    apt-get purge -y --auto-remove build-essential && \
    rm -rf /var/lib/apt/lists/*

# Copy utils folder from root context into /app/utils
COPY shared_utils ./shared_utils

# Copy chat_service folder from root context into /app/chat_service
COPY chat_service ./chat_service

# Create non-root user for security and set permissions
RUN groupadd -r appuser && useradd -r -g appuser -d /app -s /bin/bash appuser && \
    chown -R appuser:appuser /app

USER appuser

# Expose the FastAPI port    
EXPOSE 8000

CMD ["uvicorn", "chat_service.main:app", "--host", "0.0.0.0", "--port", "8000"]

Comment on lines +130 to +175
result = await perform_rag_query(
query="test query",
collection_name="test_collection",
top_k=5,
openai_client=AsyncMock(),
)

user_prompt, chunks, system_prompt, query_type = result
assert "couldn't find any relevant information" in user_prompt
assert chunks == []
assert query_type == "general"

@pytest.mark.asyncio
async def test_successful_rag_query(self, sample_chroma_results):
"""Test successful RAG query execution"""
mock_chunks = [
{"content": "test content", "similarity_score": 0.9, "doc_id": "doc1"}
]

with (
patch("routers.chat.get_chroma_client") as mock_chroma,
patch("routers.chat.prepare_retrieval_results") as mock_prepare,
patch("routers.chat.rag_optimizer") as mock_optimizer,
patch("routers.chat.prompt_templates") as mock_templates,
):
mock_collection = AsyncMock()
mock_collection.query.return_value = sample_chroma_results
mock_chroma.return_value.get_collection.return_value = mock_collection
mock_prepare.return_value = mock_chunks

mock_optimizer.chunk_optimization.return_value = (
mock_chunks,
"test context",
)
# Make detect_query_type async mock
mock_optimizer.detect_query_type = AsyncMock(return_value="summary")

mock_templates.get_system_prompt.return_value = "system prompt"
mock_templates.format_user_prompt.return_value = "user prompt"

result = await perform_rag_query(
query="test query",
collection_name="test_collection",
top_k=5,
openai_client=AsyncMock(),
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The calls to perform_rag_query in TestPerformRAGQuery are missing the required session_id argument. The function signature for perform_rag_query requires session_id as a positional argument, so these tests will raise a TypeError and fail to run. Please add a mock session_id to these calls.

Comment on lines +6 to +42
class TestChatRequest:
def test_valid_chat_request(self):
"""Test creation of valid ChatRequest"""
request = ChatRequest(
message="What is this document about?", collection_name="test_collection"
)

assert request.message == "What is this document about?"
assert request.collection_name == "test_collection"
assert request.doc_id is None

def test_chat_request_with_doc_id(self):
"""Test ChatRequest with optional doc_id"""
request = ChatRequest(
message="What is this document about?",
collection_name="test_collection",
doc_id="doc123",
)

assert request.doc_id == "doc123"

def test_chat_request_default_collection(self):
"""Test ChatRequest with default collection name"""
request = ChatRequest(message="Test message")
assert request.collection_name == "default_collection"

def test_chat_request_empty_message_allowed(self):
"""Test ChatRequest allows empty message (Pydantic v2 behavior)"""
# In Pydantic v2, empty strings are valid unless explicitly constrained
request = ChatRequest(message="", collection_name="test")
assert request.message == ""
assert request.collection_name == "test"

def test_invalid_chat_request_no_message(self):
"""Test ChatRequest validation without message"""
with pytest.raises(ValidationError):
ChatRequest(collection_name="test")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The tests for the ChatRequest model are incorrect and will fail due to several issues:

  1. Missing session_id: The ChatRequest model requires a session_id, but it is not provided in any of the test instantiations, which will cause a ValidationError.
  2. Incorrect Attribute Name: The tests refer to doc_id (singular), but the model attribute is doc_ids (plural).
  3. Incorrect Attribute Type: The tests assign a string to doc_id, but the model expects doc_ids to be a List[str].

These tests need to be corrected to align with the ChatRequest model definition.

Comment on lines +1 to +569
from typing import List, Dict, Any, Optional
from openai import AsyncOpenAI, APIError
import os
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class EnhancedQueryValidator:
"""Enhanced query validation system for LLM"""

def __init__(self):
self.validation_examples = self._get_validation_examples()

def _get_validation_examples(self) -> List[Dict[str, Any]]:
"""Get comprehensive examples for query validation"""
return [
# PROCEED_WITH_RAG examples
{
"query": "Describe the CEO's role in the company",
"decision": "PROCEED_WITH_RAG",
"reason": "Factual extraction from specific documents",
},
{
"query": "What are the main findings in the quarterly report?",
"decision": "PROCEED_WITH_RAG",
"reason": "Specific document content request",
},
{
"query": "How does the company's revenue compare to last year?",
"decision": "PROCEED_WITH_RAG",
"reason": "Analytical question requiring document data",
},
{
"query": "List all the safety protocols mentioned in the manual",
"decision": "PROCEED_WITH_RAG",
"reason": "Factual extraction from specific documents",
},
{
"query": "Summarize the key recommendations from the research paper",
"decision": "PROCEED_WITH_RAG",
"reason": "Summarization of document content",
},
# HANDLE_WITHOUT_RAG examples
{
"query": "What is machine learning?",
"decision": "HANDLE_WITHOUT_RAG",
"reason": "General knowledge question not requiring specific documents",
},
{
"query": "How do I calculate compound interest?",
"decision": "HANDLE_WITHOUT_RAG",
"reason": "General procedural knowledge",
},
{
"query": "Hello, how are you?",
"decision": "HANDLE_WITHOUT_RAG",
"reason": "Conversational greeting",
},
{
"query": "Can you help me brainstorm ideas for a presentation?",
"decision": "HANDLE_WITHOUT_RAG",
"reason": "Creative assistance not requiring document search",
},
# INVALID_QUERY examples
{
"query": "banana",
"decision": "INVALID_QUERY",
"reason": "Single word without context",
},
{
"query": "ajsdlkfj askdjf",
"decision": "INVALID_QUERY",
"reason": "Gibberish text",
},
{
"query": "???",
"decision": "INVALID_QUERY",
"reason": "No meaningful content",
},
# NEEDS_CLARIFICATION examples
{
"query": "Tell me about it",
"decision": "NEEDS_CLARIFICATION",
"reason": "Ambiguous reference - what is 'it'?",
},
{
"query": "What about the data?",
"decision": "NEEDS_CLARIFICATION",
"reason": "Vague reference - which data and what aspect?",
},
{
"query": "More information please",
"decision": "NEEDS_CLARIFICATION",
"reason": "No specific topic or context provided",
},
]

def get_enhanced_validation_prompt(
self, query: str, collection_info: Optional[str] = None
) -> str:
"""Generate enhanced validation prompt with context awareness"""

examples_text = "\n".join(
[
f'Query: "{ex["query"]}" -> {ex["decision"]} ({ex["reason"]})'
for ex in self.validation_examples
]
)

collection_context = ""
if collection_info:
collection_context = f"\nCollection Context: {collection_info}"

return f"""You are an intelligent query router for a document-based Q&A system using RAG (Retrieval Augmented Generation).

Your task is to analyze user queries and decide the best approach to handle them.

DECISION OPTIONS:
1. PROCEED_WITH_RAG - Query needs document search and retrieval
- Asks for specific information from documents
- Requires analysis of document content
- Seeks factual data, summaries, or insights from specific sources

2. HANDLE_WITHOUT_RAG - Query can be answered with general knowledge
- General knowledge questions
- Conversational queries
- Procedural or how-to questions not requiring specific documents
- Greetings and casual conversation

3. INVALID_QUERY - Query is meaningless or too vague
- Gibberish or random characters
- Single words without context
- Incomplete thoughts

4. NEEDS_CLARIFICATION - Query is ambiguous and needs more context
- Vague references without clear subjects
- Missing important context
- Ambiguous pronouns or references

EXAMPLES:
{examples_text}{collection_context}

INSTRUCTIONS:
Analyze the query considering:
1. Specificity - Is the query specific enough to be actionable?
2. Context dependency - Does it require specific document content?
3. Intent clarity - Is the user's intent clear?
4. Scope - Is it asking for general knowledge vs. specific document information?

Respond in this exact format:
DECISION: [PROCEED_WITH_RAG|HANDLE_WITHOUT_RAG|INVALID_QUERY|NEEDS_CLARIFICATION]
CONFIDENCE: [HIGH|MEDIUM|LOW]
REASON: [Brief explanation of your decision]
SUGGESTION: [If not PROCEED_WITH_RAG, suggest how to handle or what clarification is needed]

Query: "{query}"
"""


class QueryType(Enum):
"""Enumeration of supported query types"""

GENERAL = "general"
FACTUAL = "factual"
ANALYTICAL = "analytical"
SUMMARIZATION = "summarization"


class RAGConfig:
"""Configuration class for the model's RAG optimization"""

def __init__(self):
self.model_name = os.getenv("OPENAI_MODEL")

# LLM's Generation parameters optimized for RAG
self.generation_params = {
"temperature": float(os.getenv("MODEL_TEMPERATURE", "0.1")),
"max_tokens": int(os.getenv("MODEL_MAX_TOKENS", "2000")),
"top_p": float(os.getenv("MODEL_TOP_P", "0.8")),
"frequency_penalty": float(os.getenv("MODEL_FREQ_PENALTY", "0.1")),
"presence_penalty": float(os.getenv("MODEL_PRESENCE_PENALTY", "0.1")),
}

# Query parameters (lighter settings for query validation and classification)
self.validation_params = {
"temperature": 0.0,
"max_tokens": 500,
"top_p": float(os.getenv("MODEL_TOP_P", "0.8")),
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
}

# Context management
self.max_context_length = int(os.getenv("MODEL_MAX_CONTEXT", "4000"))

# RAG-specific settings
self.min_similarity_score = float(os.getenv("MODEL_MIN_SIMILARITY", "0.1"))
self.enable_reranking = (
os.getenv("MODEL_ENABLE_RERANKING", "true").lower() == "true"
)

# RAG Optimization flags
self.enable_llm_query_classification = (
os.getenv("ENABLE_LLM_QUERY_CLASSIFICATION", "true").lower() == "true"
)
self.enable_response_post_processing = (
os.getenv("ENABLE_RESPONSE_POST_PROCESSING", "true").lower() == "true"
)


class QueryClassificationExamples:
"""Few-shot examples for query type classification"""

@staticmethod
def get_classification_examples() -> List[Dict[str, str]]:
"""Returns comprehensive examples for each query type"""
return [
# Factual queries - seeking specific information, data, or facts
{"query": "What is the capital of France?", "type": "factual"},
{
"query": "When was the Declaration of Independence signed?",
"type": "factual",
},
{"query": "How many employees work at the company?", "type": "factual"},
{"query": "List all the ingredients in the recipe.", "type": "factual"},
{
"query": "What are the system requirements for the software?",
"type": "factual",
},
{"query": "Who is the CEO of the organization?", "type": "factual"},
{"query": "Define machine learning.", "type": "factual"},
{"query": "What is my shopping list?", "type": "factual"},
{"query": "Show me the sales figures for last quarter.", "type": "factual"},
# Analytical queries - requiring analysis, comparison, evaluation
{
"query": "Why did the stock price decline last month?",
"type": "analytical",
},
{
"query": "How does renewable energy compare to fossil fuels?",
"type": "analytical",
},
{
"query": "Analyze the pros and cons of remote work.",
"type": "analytical",
},
{
"query": "What are the implications of this policy change?",
"type": "analytical",
},
{
"query": "Evaluate the effectiveness of the marketing campaign.",
"type": "analytical",
},
{
"query": "How do these two products differ in performance?",
"type": "analytical",
},
{
"query": "Assess the risks associated with this investment.",
"type": "analytical",
},
{
"query": "What factors contributed to the project's success?",
"type": "analytical",
},
{
"query": "Examine the relationship between customer satisfaction and retention.",
"type": "analytical",
},
# Summarization queries - requesting condensed overview or summary
{
"query": "Summarize the main points of the report.",
"type": "summarization",
},
{
"query": "Give me an overview of the quarterly results.",
"type": "summarization",
},
{"query": "Summarize my shopping list.", "type": "summarization"},
{
"query": "Provide a summary of the meeting minutes.",
"type": "summarization",
},
{
"query": "What are the key takeaways from the research paper?",
"type": "summarization",
},
{
"query": "Condense the main findings of the survey.",
"type": "summarization",
},
{
"query": "Give me the highlights of the project status.",
"type": "summarization",
},
{
"query": "Summarize the customer feedback trends.",
"type": "summarization",
},
{
"query": "Provide an executive summary of the proposal.",
"type": "summarization",
},
# General queries - conversational, complex, or multi-faceted
{"query": "How can I improve my productivity?", "type": "general"},
{"query": "Tell me about artificial intelligence.", "type": "general"},
{"query": "What should I consider when buying a car?", "type": "general"},
{"query": "Help me understand this concept better.", "type": "general"},
{"query": "Can you explain how this process works?", "type": "general"},
{
"query": "What are some best practices for project management?",
"type": "general",
},
{"query": "I need advice on career development.", "type": "general"},
{"query": "How do I troubleshoot this technical issue?", "type": "general"},
{
"query": "What would you recommend for this situation?",
"type": "general",
},
]


class PromptTemplates:
"""Specialized prompt templates for different types of RAG queries with LLM"""

@staticmethod
def get_classification_prompt(query: str) -> str:
"""Generate prompt for LLM-based query classification"""
examples = QueryClassificationExamples.get_classification_examples()

# Create few-shot examples string
examples_text = "\n".join(
[
f'Query: "{example["query"]}" -> Type: {example["type"]}'
for example in examples
]
)

return f"""You are an expert at classifying user queries into specific types for optimal response generation.

Query Types:
- factual: Requests for specific information, data, facts, definitions, or lists
- analytical: Requests for analysis, comparison, evaluation, or reasoning about relationships and causes
- summarization: Requests for summaries, overviews, main points, or condensed information
- general: Conversational queries, advice-seeking, explanations, or multi-faceted questions

Examples:
{examples_text}

Instructions:
- Analyze the user's intent and primary goal
- Consider the expected response format and depth
- Classify based on what type of processing would best serve the user
- Respond with only one word: factual, analytical, summarization, or general

Query: "{query}"
Type:"""

@staticmethod
def get_system_prompt(query_type: str) -> str:
"""Get system prompt based on query type"""

prompts = {
QueryType.GENERAL.value: """You are an advanced AI assistant specialized in document analysis and question answering.

Your primary responsibilities:
- Analyze the provided document context carefully
- Answer questions based ONLY on the information present in the context
- Maintain accuracy and avoid hallucination
- Provide structured, comprehensive responses
- Cite specific sections when making claims

Key principles:
- If information is not in the context, clearly state this limitation
- Use direct quotes from the context when appropriate
- Organize your response logically with clear reasoning
- Maintain professional tone while being accessible""",
QueryType.FACTUAL.value: """You are a precision-focused AI assistant for factual document analysis.

Your task is to extract and present factual information from documents with maximum accuracy:
- Only state facts that are explicitly mentioned in the context
- Use exact quotes when presenting specific data, numbers, or claims
- If asked about information not in the context, respond with "This information is not available in the provided document"
- Structure factual responses with clear categorization
- Distinguish between facts, opinions, and interpretations in the source
- For lists or itemized information, present them in a clear, organized format""",
QueryType.ANALYTICAL.value: """You are an analytical AI assistant specialized in document interpretation and analysis.

Your approach:
- Analyze the document context for patterns, themes, and key insights
- Synthesize information from multiple sections when relevant
- Provide reasoned interpretations based on the available evidence
- Highlight relationships between different parts of the document
- Distinguish between what the document states directly vs. what can be reasonably inferred
- Structure analytical responses with clear reasoning chains
- Consider multiple perspectives when the document presents them""",
QueryType.SUMMARIZATION.value: """You are an expert at document summarization and synthesis.

Your summarization strategy:
- Identify the main themes and key points from the context
- Organize information hierarchically (main points, supporting details)
- Preserve important nuances and qualifications
- Maintain the original document's tone and perspective
- Create coherent summaries that capture essential information
- Use bullet points or structured format when appropriate for clarity
- Ensure summaries are concise yet comprehensive""",
}

return prompts.get(query_type, prompts[QueryType.GENERAL.value])

@staticmethod
def format_user_prompt(question: str, context: str, query_type: str) -> str:
"""Format user prompt with context and question"""

prompt_formats = {
QueryType.FACTUAL.value: f"""**DOCUMENT CONTEXT:**
{context}

**FACTUAL QUERY:** {question}

**INSTRUCTIONS:** Extract and present only the factual information from the document that directly answers this question. Use exact quotes where appropriate and clearly indicate if the requested information is not available.""",
QueryType.ANALYTICAL.value: f"""**DOCUMENT CONTEXT:**
{context}

**ANALYTICAL QUERY:** {question}

**INSTRUCTIONS:** Analyze the document context to provide a comprehensive answer. Consider relationships between different parts of the document and provide reasoned interpretations based on the evidence presented.""",
QueryType.SUMMARIZATION.value: f"""**DOCUMENT CONTEXT:**
{context}

**SUMMARIZATION REQUEST:** {question}

**INSTRUCTIONS:** Create a well-structured summary addressing the request. Organize the information logically and maintain the document's key insights and perspective.""",
QueryType.GENERAL.value: f"""**DOCUMENT CONTEXT:**
{context}

**QUESTION:** {question}

**INSTRUCTIONS:** Based on the document context provided above, give a comprehensive and accurate answer to the question. If the context doesn't contain sufficient information, clearly explain what information is missing.""",
}

return prompt_formats.get(query_type, prompt_formats[QueryType.GENERAL.value])


class RAGOptimizer:
"""Advanced optimization techniques for RAG"""

@staticmethod
async def classify_query_with_llm(
question: str,
model_name: str,
classification_params: Dict[str, Any],
openai_client: AsyncOpenAI = None,
) -> str:
"""Use LLM to classify query type with few-shot examples"""
try:
classification_prompt = PromptTemplates.get_classification_prompt(question)

messages = [{"role": "user", "content": classification_prompt}]

response = await openai_client.chat.completions.create(
model=model_name, messages=messages, **classification_params
)

if response.choices and response.choices[0].message.content:
predicted_type = response.choices[0].message.content.strip().lower()

# Validate the predicted type
valid_types = [qt.value for qt in QueryType]
if predicted_type in valid_types:
logger.info(f"LLM classified query as: {predicted_type}")
return predicted_type
else:
logger.warning(f"LLM returned invalid query type: {predicted_type}")
return QueryType.GENERAL.value
else:
logger.error("LLM classification returned empty response")
return QueryType.GENERAL.value

except APIError as e:
logger.error(f"LLM query classification failed: {e}")
return QueryType.GENERAL.value

@staticmethod
async def detect_query_type(
question: str,
model_name: str = None,
config: RAGConfig = None,
openai_client: AsyncOpenAI = None,
) -> str:
"""
Detect query type using LLM classification
"""
# Try LLM-based classification if enabled and client is available
if (
config
and config.enable_llm_query_classification
and openai_client
and model_name
):
llm_classification = await RAGOptimizer.classify_query_with_llm(
question, model_name, config.validation_params, openai_client
)

return llm_classification

# If LLM classification failed, return General
return QueryType.GENERAL.value

@staticmethod
def chunk_optimization(
chunks: List[Dict[str, Any]], max_context_length: int = 4000
) -> tuple[List[Dict[str, Any]], str]:
"""Optimize chunk selection and formatting"""

if not chunks:
return [], ""

# Build context with length management
selected_chunks = []
context_parts = []
current_length = 0

for i, chunk in enumerate(chunks):
content = chunk.get("content", "")

# Format each chunk
chunk_header = f"--- Document Section {i + 1} (Relevance: {chunk.get('similarity_score', 0):.2f}) ---"
formatted_chunk = f"{chunk_header}\n{content}\n"

# Check length constraints
chunk_length = len(formatted_chunk)
if current_length + chunk_length > max_context_length:
break

selected_chunks.append(chunk)
context_parts.append(formatted_chunk)
current_length += chunk_length

return selected_chunks, "\n".join(context_parts)

@staticmethod
def post_process_llm_response(response: str) -> str:
"""Post-process LLM response for better formatting"""

# Remove only consecutive duplicate lines to avoid breaking formatted content.
lines = response.split("\n")
if not lines:
return ""

filtered_lines = [lines[0]]
for i in range(1, len(lines)):
current_line_stripped = lines[i].strip()
prev_line_stripped = lines[i - 1].strip()
if not current_line_stripped or current_line_stripped != prev_line_stripped:
filtered_lines.append(lines[i])

cleaned_response = "\n".join(filtered_lines)

# Ensure response ends properly
if cleaned_response and not cleaned_response.rstrip().endswith(
(".", "!", "?", ":")
):
cleaned_response = cleaned_response.rstrip() + "."

return cleaned_response.strip()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This file has grown to nearly 600 lines and contains multiple distinct classes (EnhancedQueryValidator, RAGConfig, QueryClassificationExamples, PromptTemplates, RAGOptimizer). For better maintainability and separation of concerns, consider splitting this file into smaller, more focused modules. For example, prompt generation logic could go into a prompts.py module, and configuration could be in a config.py module.

Comment on lines +563 to +567
# Ensure response ends properly
if cleaned_response and not cleaned_response.rstrip().endswith(
(".", "!", "?", ":")
):
cleaned_response = cleaned_response.rstrip() + "."

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The post_process_llm_response function unconditionally appends a period . to any response that doesn't end with standard punctuation. This could lead to incorrect formatting if the LLM returns a code snippet, a list, or any other formatted text that shouldn't end with a period. Consider making this behavior conditional or refining the logic to avoid altering non-prose responses.

Comment on lines +289 to +293
return ChatResponse(
response=f"""I'm sorry, but I couldn't process your query based on the documents in {chat_request.collection_name} that you want to query data from. {validation_error} Please provide a clear, specific question that I can help answer using the available documents in {chat_request.collection_name}.""",
relevant_chunks=[],
metadata=metadata_without_rag,
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error message returned when a query fails validation is quite long and repetitive, mentioning the collection name twice. This could be simplified for better readability and a cleaner user experience.

Suggested change
return ChatResponse(
response=f"""I'm sorry, but I couldn't process your query based on the documents in {chat_request.collection_name} that you want to query data from. {validation_error} Please provide a clear, specific question that I can help answer using the available documents in {chat_request.collection_name}.""",
relevant_chunks=[],
metadata=metadata_without_rag,
)
return ChatResponse(
response=f"I'm sorry, I couldn't process your query. {validation_error} Please provide a clear, specific question about the available documents.",
relevant_chunks=[],
metadata=metadata_without_rag,
)

Comment on lines +51 to +55
response = await client.post(url=f"{PDF_PROCESSOR_URL}/chat/",
json={"message": prompt,
"doc_ids": doc_ids,
"collection_name": "SemanticEmbeds"
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The collection_name is hardcoded as "SemanticEmbeds" in the client.post call, ignoring the collection_name parameter passed to the chat_with_rag function. To make the function more reusable, you should use the collection_name parameter in the request payload.

Suggested change
response = await client.post(url=f"{PDF_PROCESSOR_URL}/chat/",
json={"message": prompt,
"doc_ids": doc_ids,
"collection_name": "SemanticEmbeds"
})
response = await client.post(url=f"{PDF_PROCESSOR_URL}/chat/",
json={"message": prompt,
"doc_ids": doc_ids,
"collection_name": collection_name
})

Comment on lines +1 to +122
# Pre-staging environment overrides - minimal for disk pressure
replicaCount: 1

# Pre-staging LLM configuration - External vLLM service on VPN
llm:
baseUrl: "http://100.108.2.57/vllm_qwen2.5/v1"
model: "Qwen2.5-14B-Coder-Instruct" # Model name as served by external vLLM

# Pre-staging resources - minimal for disk pressure
resources:
limits:
cpu: 200m
memory: 256Mi
requests:
cpu: 50m
memory: 64Mi

# Disable autoscaling for prestaging due to disk pressure
autoscaling:
enabled: false

# Pre-staging secrets
secrets:
create: false
name: chat-service-secrets

# Pre-staging specific configurations - using local CRC registry
image:
repository: default-route-openshift-image-registry.apps-crc.testing/omnipdf/chat_service
tag: "dev-v0.0.3-5d69f89"
pullPolicy: Always # Use local registry image

# Environment variables
env:
- name: LOG_LEVEL
value: "INFO"
- name: ENVIRONMENT
value: "prestaging"
- name: ENABLE_METRICS
value: "true"

# Network policy disabled for pre-staging flexibility
networkPolicy:
enabled: true
ingress:
enabled: true
allowedCallers:
# PDF Processor Service - RAG conversations
- podSelector:
matchLabels:
app.kubernetes.io/name: pdf-processor-service
egress:
enabled: true
allowDNS: true
allowHTTPS: true # External vLLM API calls
allowHTTP: true # External vLLM API calls
allowedTargets:
# ChromaDB - query vectors
- podSelector:
matchLabels:
app.kubernetes.io/name: chromadb
ports: [8000]
# MinIO - future job status + file operations
- podSelector:
matchLabels:
app.kubernetes.io/name: minio
ports: [9000]
# Redis - future session management
- podSelector:
matchLabels:
app.kubernetes.io/name: redis
ports: [6379]

# Pod disruption budget for pre-staging (2 replicas, allow 1 down)
podDisruptionBudget:
enabled: true
minAvailable: 1

# Moderate deployment strategy for pre-staging (balanced speed/safety)
deploymentStrategy:
type: RollingUpdate
rollingUpdate:
minAvailable: 1
maxSurge: 1

# Moderate resource quota for pre-staging testing
serviceMonitor:
enabled: true
interval: "20s" # Between staging and prod intervals
scrapeTimeout: "8s" # Slightly shorter timeout for testing
labels:
environment: prestaging # Help identify prestaging metrics

# Add toleration for disk pressure to allow scheduling
tolerations:
- key: "node.kubernetes.io/disk-pressure"
operator: "Exists"
effect: "NoSchedule"

# Istio sidecar injection for service mesh
podAnnotations:
sidecar.istio.io/inject: "true"
sidecar.istio.io/proxyCPU: "15m"
sidecar.istio.io/proxyMemory: "128Mi"

# Service configuration
service:
type: ClusterIP
port: 8000

# ServiceAccount configuration
serviceAccount:
create: false
automount: false

# Ingress configuration
ingress:
enabled: false
className: ""
annotations: {}
hosts: []
tls: []

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The main values.yaml file contains values that are specific to a prestaging or development environment (e.g., image repository URLs pointing to a local CRC registry, specific image tags, and an IP-based URL for the LLM). Best practice for Helm charts is for the base values.yaml to contain generic, environment-agnostic defaults, with environment-specific overrides placed in separate files like values-prestaging.yaml. This makes the chart more reusable and easier to manage across different environments.

Comment on lines +1 to +23
from pydantic import BaseModel, Field
from typing import Optional, List, Dict, Any

SEMANTIC_EMBEDDING_COLLECTION = "SemanticEmbeds"
TEXTUAL_EMBEDDING_COLLECTION = "SentenceEmbeds"


class ChatRequest(BaseModel):
"""
Request model for chat API endpoints.
"""
message: str
doc_ids: list[str]
collection_name: Optional[str] = Field(default=SEMANTIC_EMBEDDING_COLLECTION, description="ChromaDB collection name")


class ChatResponse(BaseModel):
"""
Response model for chat API
"""
response: str
relevant_chunks: List[Dict[str, Any]] = Field(default_factory=list, description="Additional metadata about the RAG process")
metadata: Dict[str, Any] No newline at end of file

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The ChatRequest and ChatResponse Pydantic models defined here are duplicates of the models in chat_service/models/chat.py. This duplication can lead to inconsistencies and maintenance issues if the models diverge. Consider moving these shared models to a common library, such as the existing shared_utils directory, to ensure a single source of truth.

Comment on lines +12 to +37
CHAT_SERVICE_URL = os.environ["CHAT_SERVICE_URL"]


@router.post("/", status_code=201, response_model=ChatResponse)
async def handle_chat(
chat_request: ChatRequest,
session_id: str = Depends(get_session_id),
_valid_session: bool = Depends(validate_session_id),
session_storage: SessionStorage = Depends(get_session_storage)
):
"""
Handle chat requests with session validation and document access control.
If doc_id is provided, validates that the user has access to that document.
"""
for doc_id in chat_request.doc_ids:
validate_session_doc_pair(doc_id, session_id, session_storage, _valid_session)

logger.info(f"Processing chat request for session {session_id}")
logger.info(f"Query: {chat_request.message}")
logger.info(f"Document IDs: {chat_request.doc_ids}")
logger.info(f"Collection: {chat_request.collection_name}")

# Proxy request to chat service
chat_request_dict = chat_request.model_dump()
chat_request_dict['session_id'] = session_id
return await proxy_post(f"{CHAT_SERVICE_URL}/chat/", chat_request_dict)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The URL construction for proxying requests to the chat service is a bit confusing. The CHAT_SERVICE_URL environment variable includes a path segment (/chat), and then another /chat/ is appended at the call site. This makes the configuration brittle. It would be clearer to define CHAT_SERVICE_URL as just the base URL of the service (e.g., http://chat_service:8000) and construct the full path at the call site.

@NotYuSheng NotYuSheng deleted the revert-chat-removal-from-dev branch October 2, 2025 15:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant